A Dependency Treebank of Classical Chinese Poems

نویسندگان

  • John Lee
  • Yin Hei Kong
چکیده

As interest grows in the use of linguistically annotated corpora in research and teaching of foreign languages and literature, treebanks of various historical texts have been developed. We introduce the first large-scale dependency treebank for Classical Chinese literature. Derived from the Stanford dependency types, it consists of over 32K characters drawn from a collection of poems written in the 8 th century CE. We report on the design of new dependency relations, discuss aspects of the annotation process and evaluation, and illustrate its use in a study of parallelism in Classical Chinese poetry.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Glimpses of Ancient China from Classical Chinese Poems

While our knowledge about ancient civilizations comes mostly from studies in archaeology and history books, much can also be learned or confirmed from literary texts. Using natural language processing techniques, we present aspects of ancient China as revealed by statistical textual analysis on the Complete Tang Poems, a 2.6-million-character corpus of all surviving poems from the Tang Dynasty ...

متن کامل

Treebanking for Data-driven Research in the Classroom

Data-driven research in linguistics typically involves the processes of data annotation, data visualization and identification of relevant patterns. We describe our experience in incorporating these processes at an undergraduate course on language information technology. Students collectively annotated the syntactic structures of a set of Classical Chinese poems; the resulting treebank was put ...

متن کامل

An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies

A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...

متن کامل

Treebank-Based Acquisition of Chinese LFG Resources for Parsing and Generation

This thesis describes a treebank-based approach to automatically acquire robust, wide-coverage Lexical-Functional Grammar (LFG) resources for Chinese parsing and generation, which is part of a larger project on the rapid construction of deep, large-scale, constraint-based, multilingual grammatical resources. I present an application-oriented LFG analysis for Chinese core linguistic phenomena an...

متن کامل

تولید درخت بانک سازه‌ای زبان فارسی به روش تبدیل خودکار

Treebanks is one of important and useful resource in Natural Language Processing tasks. Dependency and phrase structures are two famous kinds of treebanks. There have already made many efforts to convert dependency structure to phrase structure. In this paper we study an approach to convert dependency structure to phrase structure because of lack of a big phrase structure Treebank in Persian. A...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012